Using methods and ideas from statistical mechanics and random graph theory, we propose a simple
method for obtaining rigorous upper bounds for the satisfiability transition in random boolean
expressions composed of $N$ variables and $M$ clauses with $K$ variables per clause. The method is
based on the identification of the core, a subexpression (subgraph) that has the same satisfiability
properties as the original expression. We formulate self-consistency equations that determine the
macroscopic parameters of the core and compute an improved annealing bound for the satisfiability
threshold, $\alpha_c = M/N$. We illustrate the method for three sample problems: $K$-XOR-SAT,
$K$-SAT and positive 1-in-$K$-SAT.

PACS numbers: 02.10.Ox,89.20.-a,05.20.-y 



I. INTRODUCTION 

Over the past decade the statistical properties of combinatorial problems have attracted
increasingly greater attention from both the computer science and physics communities. Most
computationally difficult problems encountered in practice belong to the class of NP-complete
problems. There is a one-to-one correspondence between these problems and spin glass models.
Unlike problems with regular structure, many combinatorial optimization problems are formulated
on random graphs and hypergraphs. The long-standing problem in the computer science community is
"P vs. NP", that is, can NP-complete problems be solved in polynomial time, or are they inherently
intractable? Although the problem is extremely important, it is also deeply theoretical as it
concentrates on worst-case scenarios. From the viewpoint of practitioners, efficient algorithms
have to be designed with real-world problems in mind. Appropriate test cases can be prepared for
comparing the performance of different algorithms. However, this approach does not allow the
design of algorithms with typical performance in mind, only their comparison.

One can argue that the purely theoretical study of algorithms was somewhat impeded by the
exploding speed of computers, which encouraged experimentation. This state of affairs may be
challenged by the emerging paradigm of quantum computing. Until a working prototype of a quantum
computer is built, it exists only on paper. Classical simulations of quantum computers can be done
only for very small problems, due to speed and memory requirements. Since these requirements grow
exponentially with the size of the problem, such simulations can be used only in proof-of-concept
scenarios. While quantum computation was shown to be efficient for some classically intractable
problems (the most notable example being Shor's algorithm), whether it provides an advantage for
NP-complete problems is unresolved. Therefore, designing algorithms with typical complexity in
mind for a quantum computer may be desirable. Whether the newly proposed quantum adiabatic
algorithm is efficient in tackling NP-complete problems is an area of active research.

The statistics of real-world examples is largely unknown. As a first approximation one can 
assume that the problems can be chosen completely at random. The underlying belief is that if an 
algorithm is efficient for a uniform ensemble of randomly chosen problem instances, it will solve 
real-world examples fairly efficiently as well. The performance for random problems is a truly 
unbiased benchmark to compare different algorithms. The same explosion in computational speed 
responsible for diminished reliance on theoretical study has also reignited interest in this type of 
study. 

Many problems of interest are written as a boolean expression (a formula): a set of $N$ variables
and $M$ constraints, all of which we aim to satisfy. Each constraint is a clause involving $K$
variables and it determines which combinations of variables are permitted. The types of
constraints differ from problem to problem, but for a great many the following picture persists:
for small $\alpha = M/N$ the problem is almost always (that is, with probability 1 in the limit
$N \to \infty$) satisfiable, while at $\alpha = \alpha_c$ an abrupt change occurs, and for all
$\alpha > \alpha_c$ the problem is almost always unsatisfiable. An even more interesting
phenomenon occurs for the typical running time of an algorithm: the time it takes to solve a
problem is usually polynomial for $\alpha < \alpha_d < \alpha_c$, and exponential for
$\alpha > \alpha_d$, where $\alpha_d$ is algorithm-dependent. However, independent of the
algorithm used, the complexity peaks at $\alpha = \alpha_c$, where the probability that the
formula is satisfiable is approximately 1/2.

Random satisfiability problems grabbed the attention of the statistical physics community, since
the phenomenon in question is a phase transition; the study of this phase transition may improve 
the understanding of the physics of random materials such as glasses. This is in addition to any sta- 
tistical properties of the solutions - properties that can be used for the design of efficient classical 
or quantum algorithms. 

The quest for exact values of $\alpha_c$ or $\alpha_d$ has so far been elusive. The best results
for a particular problem, $K$-SAT, were obtained using the so-called one-step RSB approximation
and are in excellent agreement with experiment. However, the method has drawn criticism because
it is not well understood, lacks a rigorous foundation, and its results depend on extensive
numerical computations. On the upside, rigorous bounds have been obtained for $K$-XOR-SAT (note,
however, that it can be solved in polynomial time). On the mathematical side, a series of results
on rigorous lower and upper bounds on $\alpha_c$ appeared recently. Typically
lower bounds rely on an explicit algorithm and upper bounds rely on the counting of solutions. 
The trivial upper bound is obtained using the annealing approximation. All improvements over 
the annealing approximation employ the fact that at the satisfiability transition the number of 
solutions jumps from the exponentially large number $2^{aN}$ to 0. The method we propose in this
paper does not deviate from this strategy. For any random formula we identify a subformula that 
possesses identical satisfiability properties, but has suppressed fluctuations. That is, if the formula 
is satisfiable, the subformula is also satisfiable but has a significantly smaller number of solutions. 
We perform the disorder average of the number of solutions of the subformula (rather than of the
formula, as in the annealing approximation); the point where the average goes to zero determines
the upper bound on the true transition point.

The advantages of the method described here are that it is rigorous (it does not rely on any
hypotheses, although we supply proofs only when they are not immediately intuitive; it is
straightforward to rederive all the results with complete mathematical rigor) and that it is
applicable to various types of random satisfiability problems. We choose to describe $K$-XOR-SAT
as well as the NP-complete problems $K$-SAT and positive 1-in-$K$-SAT. Each problem adds its own
"touch" to the formalism. For $K$-XOR-SAT, a polynomial problem, the upper bound is exact [13],
while the upper bound for $K$-SAT grossly overestimates the transition. This could be related to
the fact that $K$-SAT is very difficult for classical algorithms. In all cases we take a two-step
approach. In the first step we compute the parameters of the subformula, the core. In the second
step we compute the annealing approximation for the number of solutions of the subformula. The
size of the core also exhibits a phase transition and has been studied for a range of problems.
Our method provides a much simpler way to derive those results.

The paper is organized as follows. We describe $K$-XOR-SAT, $K$-SAT and positive 1-in-$K$-SAT
in Sections II through IV; Section V is devoted to numerical simulations for the positive
1-in-3-SAT problem; Section VI is a summary.



II. K-XOR-SAT 



In this model an instance of the problem consists of $N$ variables and $M$ clauses, each clause
involving $K$ variables. Each variable can take the value 0 or 1. The ensemble we consider (a
random hypergraph) is that of independent clauses with the variables in each clause drawn
uniformly at random out of the set of $N$ variables. To each clause we also attribute a number,
0 or 1, each with probability 1/2, and posit that the clause is satisfied if the exclusive-or
(XOR) of the $K$ variables in the clause equals that number. The entire formula is said to be
satisfied if all of its clauses are satisfied.
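
The ensemble just described can be sampled and tested directly, since a $K$-XOR-SAT formula is a linear system over GF(2). The following is a minimal sketch, not part of the original analysis: the function names `random_xorsat` and `is_satisfiable` are illustrative, and the sampler assumes that the $K$ variables of a clause are drawn independently (so a variable may repeat within a clause, in which case its two occurrences cancel).

```python
import random

def random_xorsat(n_vars, n_clauses, k, rng):
    """Draw n_clauses clauses; each is (list of k variable indices, parity bit)."""
    return [([rng.randrange(n_vars) for _ in range(k)], rng.randrange(2))
            for _ in range(n_clauses)]

def is_satisfiable(n_vars, clauses):
    """Gaussian elimination over GF(2) on bitmask rows.

    Bits 0..n_vars-1 of a row mark the variables of a clause and bit n_vars
    stores its parity.  A row that reduces to "0 = 1" signals a contradiction.
    """
    var_mask = (1 << n_vars) - 1
    pivots = {}                                   # lowest set bit -> stored row
    for variables, parity in clauses:
        row = parity << n_vars
        for v in variables:
            row ^= 1 << v                         # a repeated variable cancels
        while row & var_mask:
            low = (row & -row).bit_length() - 1   # lowest variable still present
            if low not in pivots:
                pivots[low] = row                 # new independent equation
                break
            row ^= pivots[low]                    # eliminate that variable
        else:
            if row:                               # no variables left, parity 1
                return False
    return True

# Illustrative run: one instance below and one above M/N = 1.
rng = random.Random(0)
n = 200
for alpha in (0.5, 1.2):
    formula = random_xorsat(n, int(alpha * n), 3, rng)
    print(alpha, is_satisfiable(n, formula))
```

Repeating the run over many instances and values of $M/N$ gives an empirical estimate of the satisfiability probability for finite $N$.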

The probability that such a random formula is satisfiable, in the limit $N \to \infty$, exhibits
a sharp jump from 1 to 0 at some critical ratio of clauses to variables $\alpha_c = M/N$. We
attempt to estimate this satisfiability threshold. The simplest approximation (in fact an upper
bound) uses the first moment method (known as the annealing approximation in the physics
community). One can compute the disorder-averaged number of solutions. Where the expected number
of solutions becomes smaller than 1, almost all formulas are unsatisfiable; therefore this point
serves as an upper bound on the location of the transition. In essence we have approximated (and
bounded from above) $P(\mathrm{sat}) = P[\mathcal{N} \geq 1] = E[\theta(\mathcal{N} - 1)]$ by
$E[\mathcal{N}]$, where $\mathcal{N}$ denotes the number of solutions (an integer). In the physics
community the annealing approximation for the entropy is regarded as the replacement of the
correct quantity $E[\ln \mathcal{N}]$ by the incorrect expression $\ln E[\mathcal{N}]$.

Computing the point where the annealed entropy becomes zero is trivial. For each clause, the
probability that the clause is satisfied is independent of the assignment of variables and equals
1/2. Therefore the expected number of solutions is

$$E[\mathcal{N}] = 2^{N} 2^{-M}, \tag{1}$$

and the corresponding entropy $S_{\rm ann} = N \ln 2 - M \ln 2$ becomes negative above
$\alpha_u = 1$ (the subscript indicates that this is the upper bound).
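
The first-moment count can be checked by brute force on a tiny instance. The sketch below is an illustration with hypothetical helper names: it fixes the variable sets of the clauses and averages the number of solutions exactly over all $2^M$ parity assignments. Every assignment satisfies exactly one parity vector, so the average is $2^{N-M}$ regardless of which variables the clauses contain.

```python
from itertools import product

def count_solutions(n_vars, clauses):
    """Number of 0/1 assignments satisfying every (variables, parity) clause."""
    count = 0
    for assignment in product((0, 1), repeat=n_vars):
        if all(sum(assignment[v] for v in variables) % 2 == parity
               for variables, parity in clauses):
            count += 1
    return count

def parity_averaged_count(n_vars, variable_sets):
    """Average of count_solutions over all parity vectors, for fixed variable sets."""
    m = len(variable_sets)
    total = 0
    for parities in product((0, 1), repeat=m):
        total += count_solutions(n_vars, list(zip(variable_sets, parities)))
    return total / 2 ** m

# N = 4 variables, M = 3 clauses: the average should equal 2**(4 - 3).
print(parity_averaged_count(4, [(0, 1, 2), (1, 2, 3), (0, 2, 3)]))
```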

A. Concept of a core

A major drawback of the annealing approximation is that it fails to account for the finite
entropy at the satisfiability transition. (By accident, for this particular problem, the annealed
expression for the entropy on the satisfiable side of the transition is exact.) It can be argued
that at any finite connectivity a random graph possesses a large ($O(N)$) number of variables
that do not appear in any clauses, thus making a contribution to the entropy which we fail to
take into account. Furthermore, there are clauses involving variables that appear in no other
clause, as well as small clusters of such clauses. The annealing bound would be significantly
improved if it were possible to separate these irrelevant contributions to the entropy.

In a paper devoted to the finite-size effects of the satisfiability transition, a concept of
irrelevant clauses was put forward. Given a random formula one can always easily identify clauses
that can be trivially satisfied. The paper did not specify the procedure for finding such
clauses, only that their number is extensive ($O(N)$). One example is isolated clauses, since
their variables can always be set so as to satisfy the clause. The presence of an extensive
number of such clauses is responsible for the lower bound of 2 on the finite-size scaling
exponent $\nu$, or, in other words, for the fact that the disorder is relevant to the phase
transition.

One can try to advance the most general definition of an irrelevant clause based on local
properties alone. In fact this has been done for $K$-XOR-SAT [13]. In essence we repeat the
derivation in a slightly simplified form, but will generalize it to other problems later on. For
$K$-XOR-SAT we identify variables that appear in no clauses and delete those variables. Further,
we identify variables that appear in only one clause. Such a variable can be set to 0 or 1 (after
the other variables have been assigned) so that the clause is satisfied. Hence the satisfiability
of the entire formula is unaffected if the variable and the corresponding clause are deleted.
This process (known as the trimming algorithm, illustrated in Fig. 1) can be continued until we
end up with either an empty graph (which would imply that the formula is satisfiable) or a core:
a formula in which all variables appear in at least two clauses. One can compute the annealed
entropy on the core and use the point at which the entropy becomes zero as the improved upper
bound $\alpha'_u$.
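
The trimming procedure is short enough to state as code. What follows is a minimal sketch under two assumptions that are not in the text: clauses are given as tuples of distinct variable indices, and the helper name `xorsat_core` is illustrative. Parities play no role in trimming and are therefore omitted.

```python
from collections import defaultdict

def xorsat_core(clauses):
    """Return the sorted indices of the clauses that survive trimming.

    A clause is removed whenever it contains a variable that occurs in no
    other remaining clause; removal is repeated until no such variable is left.
    """
    occurrences = defaultdict(set)            # variable -> indices of live clauses
    for idx, clause in enumerate(clauses):
        for v in clause:
            occurrences[v].add(idx)
    alive = set(range(len(clauses)))
    stack = [v for v, occ in occurrences.items() if len(occ) == 1]
    while stack:
        v = stack.pop()
        if len(occurrences[v]) != 1:
            continue                          # degree changed since v was queued
        (idx,) = occurrences[v]
        alive.discard(idx)
        for u in clauses[idx]:
            occurrences[u].discard(idx)
            if len(occurrences[u]) == 1:
                stack.append(u)               # u has just become a degree-1 variable
    return sorted(alive)

# Clause 3 hangs off the rest through variables 4 and 5 and is trimmed away.
print(xorsat_core([(0, 1, 2), (0, 1, 3), (2, 3, 0), (3, 4, 5)]))
```

Because the result does not depend on the order of removals, a plain stack of candidate variables is sufficient; the number of variables and clauses left over gives $N'$ and $M'$ for that instance.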

We examine the structure of the remaining core. First, observe that the remaining core does not
depend on the order in which the variables and clauses are removed. In fact the remaining core
is the unique maximal subformula of the original formula with the property that every variable
appears in at least two clauses. The original formula is the core plus all deleted clauses and
variables. Assume that the core has $N'$ variables and $M'$ clauses (implying that $N - N'$
variables and $M - M'$ clauses were deleted). Correspondingly, all original graphs can be divided
into distinct groups based on the values of $N'$, $M'$. Suppose we keep $N'$ and $M'$ fixed.
Observe that to every realization of the core there corresponds an equal number of possible
realizations of the deleted clauses and, as a consequence, an equal number of possible
realizations of the original graph (in the group labeled by $N'$, $M'$). Therefore, for any fixed
$N'$, $M'$ all possible realizations of the core are equiprobable, a fact we employ to perform
disorder averages.

FIG. 1: Example of the trimming algorithm for 3-XOR-SAT. Variables are represented graphically as
vertices and clauses are represented as triangles. Incomplete triangles represent connections to
the remainder of the graph (not shown). Lightly shaded clauses are removed by the trimming
algorithm.

Notice that all possible realizations of the core are equiprobable only for fixed $N'$, $M'$.
The values of $N'$ and $M'$ themselves fluctuate. However, the fluctuations in $N'$ and $M'$ are
of order $O(\sqrt{N})$ while their respective values are $O(N)$. Since we expect the threshold to
be sharp as a function of $M'/N'$, we need not concern ourselves with these fluctuations.
Therefore we concentrate on finding the most likely values of $N'/N$ and $M'/N$. One approach is
to work with a set $\{c_k\}$, where $c_k$ is the fraction of vertices that appear in $k$ clauses.
One can describe the algorithm as a random process and study the changes in the average values of
$\{c_k\}$. The discrete steps of the algorithm are approximated by continuous time $t$, and the
set $\{c_k(t)\}$ is replaced by its generating function $c(t, x) = \sum_k c_k(t) x^k$. The
problem is then reduced to solving the resulting PDE. This is the approach taken in [13].
Slightly differing variants of this method were also employed in [14, 15, 16]. We instead opt for
an approach that does not take dynamics into consideration. The approach is inspired in part by
work analyzing the matching problem [17].

In essence, we seek the disorder average of $N'/N$. This is precisely the probability $p$ that a
randomly chosen variable belongs to the core. We can also fix a specific variable (say, variable
$x_0$) and perform a disorder average of a function that yields 1 if that vertex belongs to the
core or 0 if it does not. For every formula $\mathcal{F}$ we can introduce the set $C$ of
variables that belong to the core. Obviously $|C| = N' = pN$. Now, introduce an extension of $C$,
which we denote as $C'$, defined as the minimal set that satisfies the following requirements:

1. $C \subseteq C'$.

2. If $K - 1$ variables in some clause belong to $C'$, then the remaining variable must also
belong to $C'$.

It is straightforward to see that the set $C'$ so defined is unique. Let $|C'| = qN$, where $q$
can be interpreted as the probability that a random vertex belongs to $C'$.

Let us turn to the original random graph. The number of clauses in which the variable $x$
appears is a random variable distributed according to a Poisson distribution with parameter
$K\alpha$. In performing disorder averages we can first average over all possible disorders with
a fixed number of clauses $k$, and average over $k$ with weight $e^{-K\alpha} (K\alpha)^k / k!$
as the last step. Further, observe that those $k$ clauses are independent. Let $\mathcal{F}'$
denote the formula that is obtained by removing the variable $x$ and the clauses in which it
appears. Let $q'$ denote the parameter $q$ associated with $\mathcal{F}'$. Suppose that for some
clause in which $x$ appears, all the other $K - 1$ variables belong to $C'[\mathcal{F}']$. Then
$x$ must belong to $C'[\mathcal{F}]$. The probability that for a given clause the $K - 1$
variables other than $x$ belong to $C'[\mathcal{F}']$ is $(q')^{K-1}$. The number of such clauses
is, hence, also Poisson, but with parameter $K\alpha (q')^{K-1}$. The probability $q$ that $x$
belongs to $C'[\mathcal{F}]$ is therefore

$$q = \sum_{k=1}^{\infty} e^{-K\alpha (q')^{K-1}} \frac{\left(K\alpha (q')^{K-1}\right)^k}{k!} = 1 - e^{-K\alpha (q')^{K-1}}. \tag{2}$$

Now observe that $\mathcal{F}'$ is essentially a random formula with $N - 1$ variables and the
same (to within $O(1/N)$) ratio of clauses to variables. Therefore in the limit $N \to \infty$,
which we are ultimately interested in, there should be no difference in statistical properties,
and hence $q = q'$. This leads to the self-consistency equation

$$q = 1 - e^{-K\alpha q^{K-1}}. \tag{3}$$

Note that $q = 0$ is always a solution to this equation. Since the core is defined as the largest
possible subformula with certain properties, and the size of the core is directly related to $q$,
we must adopt the convention that the largest possible solution to Eq. (3) is always chosen.
Below a certain threshold only $q = 0$ is a solution, whereas above the threshold another
solution with $q > 0$ appears.

We now turn to the original goal of finding $N'$. If at least two clauses which include $x$ have
the property that the $K - 1$ other variables are in $C'[\mathcal{F}']$, then the variable $x$ as
well as the aforementioned variables are in $C[\mathcal{F}]$. Hence, we can write

$$p = \frac{N'}{N} = \sum_{k=2}^{\infty} e^{-K\alpha q^{K-1}} \frac{\left(K\alpha q^{K-1}\right)^k}{k!} = 1 - \left(1 + K\alpha q^{K-1}\right) e^{-K\alpha q^{K-1}}. \tag{4}$$
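
The self-consistency condition for $q$ and the resulting core fraction $p$ are easy to evaluate numerically. The sketch below is one possible way to do it, not the authors' procedure: it iterates the map $q \mapsto 1 - e^{-K\alpha q^{K-1}}$ downward from $q = 1$, which converges to the largest solution, and then locates the onset of the core by bisection. The function names, tolerances and bracketing interval are assumptions chosen for illustration.

```python
import math

def largest_q(alpha, k, tol=1e-13, max_iter=200000):
    """Largest root of q = 1 - exp(-K*alpha*q**(K-1)), iterating down from q = 1."""
    q = 1.0
    for _ in range(max_iter):
        q_next = 1.0 - math.exp(-k * alpha * q ** (k - 1))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def core_fraction(alpha, k):
    """p = N'/N: probability that at least two clauses of a variable qualify."""
    q = largest_q(alpha, k)
    mu = k * alpha * q ** (k - 1)             # Poisson parameter of qualifying clauses
    return 1.0 - (1.0 + mu) * math.exp(-mu)

def core_onset(k, lo, hi, tol=1e-5):
    """Bisect for the smallest alpha with a nonempty core.

    Assumes there is no core at `lo` and a core at `hi`.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if core_fraction(mid, k) > 1e-6:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(core_onset(3, 0.5, 1.0))    # onset of the core for K = 3
print(core_fraction(0.85, 3))     # core fraction slightly above the onset
```

The core fraction jumps discontinuously from zero to a finite value at the onset, which is why the bisection tests for a small positive threshold rather than for a sign change.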

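To compute $M'$ we examine the average degree (the number of clauses in which it appears) of a
randomly chosen vertex in the core. The latter should equal $KM'/N'$. If vertex $x$ is in the
core (with probability $p$), the number of its clauses which are in the core was shown above to
be a random variable following a truncated (only $k \geq 2$ is allowed) Poisson distribution with
parameter $K\alpha q^{K-1}$. Therefore

$$\frac{KM'}{N'} = \sum_{k=2}^{\infty} k\, e^{-K\alpha q^{K-1}} \frac{\left(K\alpha q^{K-1}\right)^k}{k!} \Bigg/ \sum_{k=2}^{\infty} e^{-K\alpha q^{K-1}} \frac{\left(K\alpha q^{K-1}\right)^k}{k!}. \tag{5}$$

Recognizing that the denominator is $p = N'/N$ we can rewrite

$$\frac{M'}{N} = \frac{1}{K} \sum_{k=2}^{\infty} k\, e^{-K\alpha q^{K-1}} \frac{\left(K\alpha q^{K-1}\right)^k}{k!} = \alpha q^{K-1} \left(1 - e^{-K\alpha q^{K-1}}\right) = \alpha q^{K}. \tag{6}$$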

B. Improved annealing bound 

As with the original annealing bound, we are aided by the fact that clauses require that the
exclusive-or of the variables be either 0 or 1 with probability 1/2. The probability that a
clause is satisfied is independent of the assignment of the variables, and the entropy is
predicted to decrease to zero when $M'/N' = 1$, or

$$\alpha q^{K} = q - K\alpha q^{K-1} + K\alpha q^{K}. \tag{7}$$

Coupled with $1 - q = e^{-K\alpha q^{K-1}}$, this puts the upper bound on the critical threshold
(for $K = 3$) at $\alpha'_u \approx 0.918$.
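
The improved bound can be reproduced numerically by solving the self-consistency equation for $q$ and bisecting on the condition $M'/N' = 1$. This is a sketch under stated assumptions: the function names and the bracketing interval are illustrative, and the bisection presumes that a core exists throughout the interval and that the ratio $M'/N'$ crosses 1 exactly once there.

```python
import math

def largest_q(alpha, k, tol=1e-13, max_iter=200000):
    """Largest root of q = 1 - exp(-K*alpha*q**(K-1)), iterating down from q = 1."""
    q = 1.0
    for _ in range(max_iter):
        q_next = 1.0 - math.exp(-k * alpha * q ** (k - 1))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def core_ratio(alpha, k):
    """M'/N' on the core, from M'/N = alpha*q**K and N'/N = 1 - (1 + mu)*exp(-mu)."""
    q = largest_q(alpha, k)
    mu = k * alpha * q ** (k - 1)
    core_vars = 1.0 - (1.0 + mu) * math.exp(-mu)
    return alpha * q ** k / core_vars

def improved_upper_bound(k, lo, hi, tol=1e-9):
    """Bisect for the alpha at which the core has as many clauses as variables."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if core_ratio(mid, k) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(improved_upper_bound(3, 0.85, 1.0))
```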

A remarkable feature of $K$-XOR-SAT is that whenever it is satisfiable, the number of its
solutions equals the number of solutions of the corresponding "ferromagnetic" model, where we
require that the exclusive-or of the variables be precisely 0 in all clauses. Note that for
unsatisfiable $K$-XOR-SAT this is not so; the ferromagnetic model always possesses at least one
solution. The next observation is that the disorder average of the square of the number of
solutions of $K$-XOR-SAT, $E[\mathcal{N}^2]$, equals $2^{N-M}$ multiplied by $E[\mathcal{N}]$ as
computed for the ferromagnetic model. As long as the annealing bound for the ferromagnetic model
equals that for $K$-XOR-SAT we can be sure that the annealing bound is correct and we are in the
satisfiable phase. The point at which it ceases to be so is the lower bound on the satisfiability
transition, $\alpha_l$.

Finding the annealed entropy for the ferromagnetic model on a complete graph is trivial and
amounts to finding a maximum over $m$ of

$$-N \left[\frac{1+m}{2} \ln \frac{1+m}{2} + \frac{1-m}{2} \ln \frac{1-m}{2}\right] + M \ln \frac{1 + m^K}{2}. \tag{8}$$



For as long as $m = 0$ is a global maximum of this expression, the annealed entropies of the
ferromagnetic and random models are equal. It ceases to be so at $\alpha_l \approx 0.889$, which
serves as a lower bound on the satisfiability transition. It is worthwhile to compute the
annealed entropy on the core. That task has been accomplished in [13]. We rederive the results
using a different method which can be readily generalized to other problems.
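
The lower bound on the untrimmed graph can be checked numerically. The sketch below assumes the standard form of the ferromagnetic annealed entropy per variable, $s(m) = H(m) + \alpha \ln\frac{1+m^K}{2}$, where $H(m)$ is the entropy of assignments with average spin $m$ in the $\pm 1$ representation; this form and the function names are assumptions made for illustration. The point $m = 0$ remains the global maximum as long as $\alpha \leq [\ln 2 - H(m)] / \ln(1 + m^K)$ for every $m \in (0, 1]$, so the bound is the minimum of that ratio, found here by a simple grid scan.

```python
import math

def spin_entropy(m):
    """Entropy per variable (in nats) of +/-1 assignments with average spin m."""
    total = 0.0
    for x in ((1.0 + m) / 2.0, (1.0 - m) / 2.0):
        if x > 0.0:
            total -= x * math.log(x)
    return total

def second_moment_lower_bound(k, grid=20000):
    """Largest alpha for which m = 0 is the global maximum of s(m).

    Scans m over (0, 1]; negative m is not scanned (for odd K it can only
    lower the second term, and for even K the expression is symmetric in m).
    """
    best = float("inf")
    for i in range(1, grid + 1):
        m = i / grid
        ratio = (math.log(2.0) - spin_entropy(m)) / math.log1p(m ** k)
        best = min(best, ratio)
    return best

print(second_moment_lower_bound(3))
```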

The annealed entropy is simply the difference between $\ln \mathcal{N}_{s,J}$ and
$\ln \mathcal{N}_J$, where $\mathcal{N}_J$ is the number of possible disorders, and
$\mathcal{N}_{s,J}$ counts the total number of combinations of disorder configurations and
variable assignments compatible with the disorder. For simplicity, we decide to distinguish
between disorders that differ only by a permutation of clauses or a permutation of variables
within clauses. Any double counting in $\mathcal{N}_J$ due to this convention will be exactly
canceled by an identical factor in $\mathcal{N}_{s,J}$. The advantages are especially evident for
the case of the original random graph. We can immediately obtain $\mathcal{N}_J = N^{KM}$. The
expression is more complex when restricted so that the degrees of all variables are at least 2.
We now investigate it closely. We introduce a set $\{c_{\mathbf{k}}\}$ where $\mathbf{k}$ is a
vector with $K$ components $\{k_p\}$ that count the number of clauses in which some variable
appears in the $p$-th position. The quantity $c_{\mathbf{k}}$ is the fraction of variables
described by the vector $\mathbf{k}$. One trivial constraint is that
$\sum_{\mathbf{k}} c_{\mathbf{k}} = 1$. One can represent disorders as an $M' \times K$ table of
numbers from 1 to $N'$. We can divide the variables into various classes according to the value
of $\mathbf{k}$. The number of all possible permutations is the product of two factors:

1. $N'! / \prod_{\mathbf{k}} N'_{\mathbf{k}}!$ for the number of ways to arrange the variables
into the various classes.

2. $\prod_p \left[ M'! / \prod_{\mathbf{k}} (k_p!)^{N'_{\mathbf{k}}} \right]$ for the number of
ways to rearrange the variables in the $M' \times K$ table.

In general we ought to perform a sum over all possible values of $N'_{\mathbf{k}}$. However, the
sum is dominated by the particular values of $N'_{\mathbf{k}}$ that maximize the entropy
($N'_{\mathbf{k}} = N' c_{\mathbf{k}}$):

$$S_J[c_{\mathbf{k}}] = -N' \sum_{\mathbf{k}} c_{\mathbf{k}} \ln c_{\mathbf{k}} + K(M' \ln M' - M') - N' \sum_{\mathbf{k}} c_{\mathbf{k}} \Big( \sum_p \ln k_p! \Big) = S_J^{(1)}[c_{\mathbf{k}}] + K(M' \ln M' - M'). \tag{9}$$

Note that we have $K$ constraints on $c_{\mathbf{k}}$,

$$\sum_{\mathbf{k}} k_p c_{\mathbf{k}} = M'/N', \tag{10}$$

and that we require $c_{\mathbf{k}} = 0$ if $\sum_p k_p < 2$.

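Maximizing $S_J[c_{\mathbf{k}}]$ is easiest if we work with its dual transform. Let
$\{-\ln \mu_p\}$ be the dual variables associated with the constraints (10). Instead of finding

$$S_J^{(1)}[N', M'] = \max_{c_{\mathbf{k}}} \Big\{ S_J^{(1)}[c_{\mathbf{k}}] \;\Big|\; \sum_{\mathbf{k}} k_p c_{\mathbf{k}} = M'/N' \Big\}, \tag{11}$$

we compute

$$\tilde{S}_J^{(1)}[\{\mu_p\}] = \min_{\{M'_p\}} \Big\{ -\sum_p M'_p \ln \mu_p - S_J^{(1)}[N', \{M'_p\}] \Big\}, \tag{12}$$

where $S_J^{(1)}[N', \{M'_p\}]$ denotes $S_J^{(1)}[c_{\mathbf{k}}]$ maximized under the
constraints $\sum_{\mathbf{k}} k_p c_{\mathbf{k}} = M'_p/N'$. After simplifications

$$\tilde{S}_J^{(1)}[\{\mu_p\}] = \min_{c_{\mathbf{k}}} \Big\{ -N' \sum_{\mathbf{k}} c_{\mathbf{k}} \sum_p k_p \ln \mu_p + N' \sum_{\mathbf{k}} c_{\mathbf{k}} \ln \Big[ c_{\mathbf{k}} \prod_p k_p! \Big] \Big\}. \tag{13}$$

Optimizing this with respect to $c_{\mathbf{k}}$ under the constraints
$\sum_{\mathbf{k}} c_{\mathbf{k}} = 1$ and $c_{\mathbf{k}} = 0$ for $|\mathbf{k}| < 2$ yields

$$\tilde{S}_J^{(1)}[\{\mu_p\}] = -N' \ln G\Big( \sum_p \mu_p \Big), \tag{14}$$

where $G(x) = \sum_{k \geq 2} x^k / k! = e^x - 1 - x$ is the generating function of the ensemble.
Reverting to the original variables is easy. Via the dual transform we obtain

$$S_J^{(1)}[N', M'] = \min_{\{\mu_p\}} \Big\{ -M' \sum_p \ln \mu_p - \tilde{S}_J^{(1)}[\{\mu_p\}] \Big\}, \tag{15}$$

and for $\ln \mathcal{N}_J$ we obtain

$$S_J[N', M'] = \min_{\{\mu_p\}} \Big\{ M' \sum_p \ln \frac{M'}{\mu_p} + N' \ln G\Big( \sum_p \mu_p \Big) \Big\} - KM'. \tag{16}$$

Clearly, the minimum is permutation-symmetric: $\mu_p = \mu/K$. Equivalently,

$$S_J[N', M'] = \min_{\mu} \Big\{ KM' \ln \frac{KM'}{\mu} + N' \ln G(\mu) \Big\} - KM'. \tag{17}$$

Comparison with the expressions for $N'/N$ and $M'/N$ obtained above gives
$\mu = K\alpha q^{K-1}$. Note that substituting $G(x) = e^x$ (meaning no constraints on the
degrees of variables) gives $\mu = KM'/N'$ and $S_J[N', M'] = KM' \ln N'$, as expected.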

We now turn to computing the logarithm of $\mathcal{N}_{s,J}$. Binary variables can take the
values 0 and 1. Since these can be mapped onto $+1$ and $-1$, with exclusive-or replaced by a
product, from now on we shall succinctly refer to the values taken by variables as $+$ and $-$.
For each realization of disorder and variable assignment, we ascribe a type $\sigma$ to each
clause according to the values of the variables inside that clause; $\sigma$ is a vector with $K$
elements $\sigma_p \in \{+, -\}$. For the time being we fix the number of clauses of each type
$M'_\sigma$ (remember that $M'_\sigma = 0$ unless $\prod_p \sigma_p = +$ for the ferromagnetic
model). In addition to its value $s \in \{+, -\}$, we ascribe to each variable a vector
$\mathbf{k}$ of $K 2^K$ elements; $k_p^\sigma$ denotes the number of clauses of type $\sigma$ in
which that variable appears in the $p$-th position. Having fixed $M'_\sigma$ and
$N'_{s,\mathbf{k}}$ we discover that the contribution to $\mathcal{N}_{s,J}$ is given by the
product of three factors:

1. $N'! / \prod_{s,\mathbf{k}} N'_{s,\mathbf{k}}!$ for the number of ways to arrange the
variables into classes.

2. $\prod_{p,\sigma} \left[ M'_\sigma! / \prod_{s,\mathbf{k}} (k_p^\sigma!)^{N'_{s,\mathbf{k}}} \right]$
for the number of ways to rearrange the variables within the clauses.

3. $M'! / \prod_\sigma M'_\sigma!$ for the number of ways to assign types to clauses.

The associated entropy

$$S_{s,J}[c_{s,\mathbf{k}}] = -N' \sum_{s,\mathbf{k}} c_{s,\mathbf{k}} \ln \Big[ c_{s,\mathbf{k}} \prod_{p,\sigma} k_p^\sigma! \Big] + K \sum_\sigma (M'_\sigma \ln M'_\sigma - M'_\sigma) + M' \ln M' - \sum_\sigma M'_\sigma \ln M'_\sigma \tag{18}$$

is to be maximized under the constraints

$$\sum_{s,\mathbf{k}} k_p^\sigma c_{s,\mathbf{k}} = M'_\sigma / N' \tag{19}$$

and the requirement that $c_{s,\mathbf{k}} = 0$ unless $\sum_{p,\sigma} k_p^\sigma \geq 2$.
Another important constraint is that unless $\sigma_p = s$, we require $k_p^\sigma = 0$.
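
The optimization of the first part of the entropy is best accomplished through the use of the
dual transformation. The dual parameters are $\{-\ln \mu_p^\sigma\}$:

$$\tilde{S}_{s,J}^{(1)}[\{\mu_p^\sigma\}] = \min_{\{M_p^\sigma\}} \Big\{ -\sum_{p,\sigma} M_p^\sigma \ln \mu_p^\sigma - \max_{c_{s,\mathbf{k}}} \Big\{ -N' \sum_{s,\mathbf{k}} c_{s,\mathbf{k}} \ln \Big[ c_{s,\mathbf{k}} \prod_{p,\sigma} k_p^\sigma! \Big] \;\Big|\; \sum_{s,\mathbf{k}} k_p^\sigma c_{s,\mathbf{k}} = M_p^\sigma / N' \Big\} \Big\}. \tag{20}$$

After simplifications we can rewrite

$$\tilde{S}_{s,J}^{(1)}[\{\mu_p^\sigma\}] = -N' \ln \Big[ G\Big( \sum_{p,\sigma} \frac{1 + \sigma_p}{2} \mu_p^\sigma \Big) + G\Big( \sum_{p,\sigma} \frac{1 - \sigma_p}{2} \mu_p^\sigma \Big) \Big]. \tag{21}$$

The argument of the first $G$ is a sum restricted to $\sigma_p = +$, and the argument of the
second $G$ is a sum restricted to $\sigma_p = -$. It is convenient to introduce

$$\mu_\pm = \sum_{p,\sigma} \frac{1 \pm \sigma_p}{2} \mu_p^\sigma. \tag{22}$$

Reverting the dual transformation we can obtain

$$S_{s,J}^{(1)}[N', \{M'_\sigma\}] = \min_{\{\mu_p^\sigma\}} \Big\{ -\sum_{p,\sigma} M'_\sigma \ln \mu_p^\sigma - \tilde{S}_{s,J}^{(1)}[\{\mu_p^\sigma\}] \Big\}, \tag{23}$$

$$S_{s,J}[N', \{M'_\sigma\}] = \min_{\{\mu_p^\sigma\}} \Big\{ \sum_{p,\sigma} M'_\sigma \ln \frac{M'_\sigma}{\mu_p^\sigma} + N' \ln \left[ G(\mu_+) + G(\mu_-) \right] \Big\} - KM' + M' \ln M' - \sum_\sigma M'_\sigma \ln M'_\sigma. \tag{24}$$

It is convenient to define

$$\mathcal{M}_\pm = \sum_{p,\sigma} \frac{1 \pm \sigma_p}{2} M'_\sigma.$$

The expression for the entropy can be simplified to

$$S_{s,J}[N', \{M'_\sigma\}] = \min_{\mu_\pm} \Big\{ \mathcal{M}_+ \ln \frac{\mathcal{M}_+}{\mu_+} + \mathcal{M}_- \ln \frac{\mathcal{M}_-}{\mu_-} + N' \ln \left[ G(\mu_+) + G(\mu_-) \right] \Big\} - KM' + M' \ln M' - \sum_\sigma M'_\sigma \ln M'_\sigma. \tag{25}$$

The expression for the annealed entropy $S_{\rm ann} = S_{s,J} - S_J$ thus reads

$$S_{\rm ann}[N', \{M'_\sigma\}] = \min_{\mu_\pm} \Big\{ \mathcal{M}_+ \ln \frac{\mathcal{M}_+}{\mu_+} + \mathcal{M}_- \ln \frac{\mathcal{M}_-}{\mu_-} + N' \ln \left[ G(\mu_+) + G(\mu_-) \right] \Big\} - \min_{\mu} \Big\{ KM' \ln \frac{KM'}{\mu} + N' \ln G(\mu) \Big\} + M' \ln M' - \sum_\sigma M'_\sigma \ln M'_\sigma. \tag{26}$$

This expression has to be maximized with respect to $M'_\sigma$. As a first step, we would like
to maximize the third term, $S_{\rm ann}^{(3)} = M' \ln M' - \sum_\sigma M'_\sigma \ln M'_\sigma$,
keeping $M'$ and $\mathcal{M}_+ - \mathcal{M}_-$ fixed. Its dual is

$$\tilde{S}^{(3)}(h) = \min_{\{M'_\sigma\}} \Big\{ -h(\mathcal{M}_+ - \mathcal{M}_-) - M' \ln M' + \sum_\sigma M'_\sigma \ln M'_\sigma \;\Big|\; \sum_\sigma M'_\sigma = M' \Big\}. \tag{27}$$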
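
Let $\epsilon_\sigma \in \{0, 1\}$ determine whether a clause of type $\sigma$ is permitted
($\epsilon_\sigma = 1$) or not ($\epsilon_\sigma = 0$). For the ferromagnetic model
$\epsilon_\sigma = \frac{1}{2}\left(1 + \prod_p \sigma_p\right)$. For $\tilde{S}^{(3)}$ we obtain

$$\tilde{S}^{(3)}(h) = -M' \ln \sum_\sigma \epsilon_\sigma e^{h \sum_p \sigma_p} = -M' \ln \frac{(2\cosh h)^K + (2\sinh h)^K}{2}, \tag{28}$$

and the original $S_{\rm ann}^{(3)}(\mathcal{M}_+, \mathcal{M}_-)$ is given by

$$S_{\rm ann}^{(3)}[\mathcal{M}_+, \mathcal{M}_-] = \min_h \Big\{ -h(\mathcal{M}_+ - \mathcal{M}_-) + M' \ln \sum_\sigma \epsilon_\sigma e^{h \sum_p \sigma_p} \Big\}. \tag{29}$$

It is convenient to parameterize $\mathcal{M}_+$ and $\mathcal{M}_-$ by a single parameter
($\mathcal{M}_+ + \mathcal{M}_- = KM'$ is a second constraint). We can arbitrarily choose $h$ as
such a parameter:

$$\mathcal{M}_+ = \frac{M'}{2} \Big[ K + \frac{d}{dh} \ln \sum_\sigma \epsilon_\sigma e^{h \sum_p \sigma_p} \Big], \tag{30}$$

$$\mathcal{M}_- = \frac{M'}{2} \Big[ K - \frac{d}{dh} \ln \sum_\sigma \epsilon_\sigma e^{h \sum_p \sigma_p} \Big]. \tag{31}$$

For the case of the ferromagnetic model this becomes

$$\mathcal{M}_\pm = KM' e^{\pm h}\, \frac{(2\cosh h)^{K-1} \pm (2\sinh h)^{K-1}}{(2\cosh h)^K + (2\sinh h)^K}. \tag{32}$$

Subsequently, we compute $S_{\rm ann}$ as a function of $h$ and maximize the expression with
respect to $h$. For our special case we obtain that $h = 0$ gives the maximum of the expression
as long as $M' < N'$. For $h = 0$ the annealed entropy takes a particularly simple form,
$S_{\rm ann} = N' \ln 2 - M' \ln 2$. Note that this is precisely the annealed entropy for
$K$-XOR-SAT. Therefore, the annealing approximation is correct up to $M'/N' = 1$, and the
corresponding connectivity of the original graph, $\alpha \approx 0.918$, is both an upper and a
lower bound, i.e. the exact answer.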



III. K-SAT MODEL



An instance of $K$-SAT is a set of $M$ clauses, each clause consisting of $K$ literals, where
each literal is either one of the $N$ variables $x_i$ or its negation $\bar{x}_i$, each with
probability 1/2. A clause is satisfied if at least one of its literals is 1. Using boolean logic
a clause can be written as an "or" of literals, e.g. $x_1 \vee x_3 \vee x_4$. A formula is
satisfied if all of its clauses are satisfied. For randomly generated formulae, a satisfiability
transition as a function of $M/N$ occurs at some critical ratio $\alpha_c = M/N$. The exact
location of this phase transition is a major open problem.

A trivial upper bound is given by the annealing approximation. Notice that the probability that
a random clause is satisfied is independent of the variable assignment and equals $1 - 2^{-K}$.
Correspondingly the annealed entropy is

$$\ln E[\mathcal{N}] = N \ln 2 + M \ln\left(1 - 2^{-K}\right). \tag{33}$$

The annealing bound (where the annealed entropy is 0) is hence
$-1/\log_2\left(1 - 2^{-K}\right)$. For $K = 3$ this gives an upper bound of
$\alpha_u \approx 5.191$, whereas numerical evidence places the transition at
$\alpha_c \approx 4.2$.
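
The first-moment bound for $K$-SAT is a one-line computation; the sketch below (with illustrative function names) evaluates the annealed entropy per variable and the ratio at which it vanishes.

```python
import math

def annealed_entropy_per_variable(alpha, k):
    """ln E[N_solutions] / N for random K-SAT at clause-to-variable ratio alpha."""
    return math.log(2.0) + alpha * math.log(1.0 - 2.0 ** (-k))

def annealed_bound(k):
    """Ratio alpha at which the annealed entropy vanishes: -1 / log2(1 - 2**-K)."""
    return -1.0 / math.log2(1.0 - 2.0 ** (-k))

for k in (3, 4, 5):
    print(k, annealed_bound(k))
```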

A. Core for the K-SAT problem

Here the structure of disorder is more complex compared with the ferromagnetic model since
variables can appear both positively ($x$) and negatively ($\bar{x}$). To identify irrelevant
clauses we use the pure literal heuristic. Variables that appear only positively or only
negatively can be set to 1 or 0, respectively, to satisfy the clauses in which they appear.
Removing such "pure" literals together with the clauses in which they appear for as long as
possible (as usual, we also remove variables that appear in no clauses) yields a much smaller
graph, a core (see Fig. 2). Moreover, by the same logic, all cores with the same number of
variables $N'$ and clauses $M'$ that satisfy the condition that every variable appears at least
once positively and at least once negatively are equiprobable. We now turn to the subproblem of
finding the expectation values of $N'$ and $M'$ as a function of the ratio $\alpha = M/N$ that
characterizes the original random formula.
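
The pure literal heuristic can be written as a short trimming loop. The following is a minimal sketch; the encoding of literals as signed integers ($+v$ for $x_v$, $-v$ for $\bar{x}_v$, with variables numbered from 1) and the function name `pure_literal_core` are assumptions made for illustration.

```python
from collections import defaultdict

def pure_literal_core(clauses):
    """Return the sorted indices of clauses left after removing pure literals.

    `clauses` is a list of tuples of nonzero integers: +v stands for variable v
    and -v for its negation.  A literal is pure when it occurs in some remaining
    clause while its negation occurs in none.
    """
    occ = defaultdict(set)                    # literal -> indices of live clauses
    for idx, clause in enumerate(clauses):
        for lit in clause:
            occ[lit].add(idx)
    alive = set(range(len(clauses)))

    def is_pure(lit):
        return bool(occ[lit]) and not occ[-lit]

    stack = [lit for lit in list(occ) if is_pure(lit)]
    while stack:
        lit = stack.pop()
        if not is_pure(lit):
            continue                          # status changed since it was queued
        for idx in list(occ[lit]):
            alive.discard(idx)
            for other in clauses[idx]:
                occ[other].discard(idx)
                if is_pure(-other):           # the negation may have become pure
                    stack.append(-other)
    return sorted(alive)

# Variables 4 and 5 are pure, so the last clause is removed; the rest is a core.
print(pure_literal_core([(1, 2, 3), (-1, 2, -3), (1, -2, 3), (3, 4, 5)]))
```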

As before, we use the notation $p = N'/N$ for the probability that a randomly chosen vertex
belongs to the core. The set of variables in the core is denoted as $C$. We now introduce two
different extensions of this set, $C'$ and $\bar{C}'$, the minimal sets with the following
properties:

1. $C \subseteq C'$ and $C \subseteq \bar{C}'$.

2. If for some clause $K - 1$ variables have a certain property, so should the remaining
variable; the property being that the variable belongs to $C'$ if it appears positively or
belongs to $\bar{C}'$ if it appears negatively.

We also reserve the notation $q = |C'|/N$ and $\bar{q} = |\bar{C}'|/N$. Also observe that
$C = C' \cap \bar{C}'$.

Fix a variable xq. It appears in k clauses positively (as xq) and in k clauses negatively (as xq). 
The numbers k, k are independent random variables distributed according to a Poisson distribution 



15 




FIG. 2: Example of the trimming algorithm for 3-SAT. Variables are represented graphically as vertices and 
clauses are represented as triangles. Signs "+" and "-" in triangles indicate whether the variable appears 
positively or negatively. Incomplete triangles represent connections to the remainder of the graph (not 
shown). Lightly shaded clauses are removed by the trimming algorithm. 

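with parameter $K\alpha/2$. We assume that $q$ and $\bar{q}$ for the full formula $\mathcal{F}$ are not different from $q'$
and $\bar{q}'$ for the formula $\mathcal{F}'$ with variable $x_0$ deleted. Dropping primes we can write self-consistency
equations for $q$, $\bar{q}$:

$$q = \sum_{\bar{k}=0}^{\infty}\sum_{k=1}^{\infty} e^{-K\alpha\left(\frac{q+\bar{q}}{2}\right)^{K-1}}\,\frac{1}{k!\,\bar{k}!}\left[\frac{K\alpha}{2}\left(\frac{q+\bar{q}}{2}\right)^{K-1}\right]^{k+\bar{k}} = 1 - e^{-\frac{K\alpha}{2}\left(\frac{q+\bar{q}}{2}\right)^{K-1}}, \tag{34}$$

$$\bar{q} = \sum_{\bar{k}=1}^{\infty}\sum_{k=0}^{\infty} e^{-K\alpha\left(\frac{q+\bar{q}}{2}\right)^{K-1}}\,\frac{1}{k!\,\bar{k}!}\left[\frac{K\alpha}{2}\left(\frac{q+\bar{q}}{2}\right)^{K-1}\right]^{k+\bar{k}} = 1 - e^{-\frac{K\alpha}{2}\left(\frac{q+\bar{q}}{2}\right)^{K-1}}. \tag{35}$$

Here the sums run over the numbers $k$ and $\bar{k}$ of clauses containing $x_0$ positively and negatively in which all $K-1$ other variables have the property of item 2; each of these numbers is Poisson distributed with parameter $\frac{K\alpha}{2}\left(\frac{q+\bar{q}}{2}\right)^{K-1}$.

Obviously $q = \bar{q}$ and a simpler equation could be written

$$q = 1 - e^{-\frac{K\alpha}{2} q^{K-1}}. \tag{36}$$
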

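Notice that this is identical to the corresponding self-consistency equation for $K$-XOR-SAT with the replacement $\alpha \to \alpha/2$. As a consequence, the core
appears at exactly twice the threshold for $K$-XOR-SAT (for 3-XOR-SAT the core appears at $\alpha \approx 0.818$,
and for 3-SAT it appears at $\alpha \approx 1.636$). This threshold was obtained earlier (by a different
method) in one of the first papers on lower bounds for the satisfiability transition in 3-SAT.

To find $p$, the sums have to be restricted to $k \ge 1$ and $\bar{k} \ge 1$, thus giving $p = q\bar{q}$. Hence

$$N'/N = q^2. \tag{37}$$

To find $M'/N$ we need to count the average degree of the variable in the core:

$$\frac{M'}{N} = \frac{1}{K}\sum_{k=1}^{\infty}\sum_{\bar{k}=1}^{\infty} (k+\bar{k})\, e^{-K\alpha\left(\frac{q+\bar{q}}{2}\right)^{K-1}}\,\frac{1}{k!\,\bar{k}!}\left[\frac{K\alpha}{2}\left(\frac{q+\bar{q}}{2}\right)^{K-1}\right]^{k+\bar{k}} = \alpha\left(\frac{q+\bar{q}}{2}\right)^{K-1}\left(1 - e^{-\frac{K\alpha}{2}\left(\frac{q+\bar{q}}{2}\right)^{K-1}}\right). \tag{38}$$

Simplified, this becomes $M'/N = \alpha q^K$.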
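
The onset of the core and its size can be checked numerically. The sketch below is an illustration only, not the authors' code: the function names `core_onset` and `core_size` are hypothetical, and $K \ge 3$ is assumed. It uses the fact that $q = 1 - e^{-c\,q^{K-1}}$ has a solution with $q > 0$ only if $c \ge \min_q [-\ln(1-q)/q^{K-1}]$, with $c = K\alpha/2$ for $K$-SAT and $c = K\alpha$ for $K$-XOR-SAT.

```python
import math


def core_onset(K, sat=True, grid=200000):
    """Smallest alpha for which q = 1 - exp(-c q^(K-1)) has a root q > 0.

    c = K*alpha/2 for K-SAT (Eq. 36) and c = K*alpha for K-XOR-SAT.
    The minimum of -ln(1-q)/q^(K-1) is located by a plain grid search.
    """
    c_min = min(-math.log1p(-i / grid) / (i / grid) ** (K - 1)
                for i in range(1, grid))
    return c_min / (K / 2 if sat else K)


def core_size(alpha, K, iters=10000):
    """Iterate Eq. (36) from q = 1; return (N'/N, M'/N) = (q^2, alpha q^K)."""
    q = 1.0
    for _ in range(iters):
        q = 1.0 - math.exp(-0.5 * K * alpha * q ** (K - 1))
    return q * q, alpha * q ** K


if __name__ == "__main__":
    print("3-XOR-SAT core onset:", core_onset(3, sat=False))
    print("3-SAT core onset:    ", core_onset(3, sat=True))
    print("3-SAT core at alpha = 2:", core_size(2.0, 3))
```

Starting the iteration from $q = 1$ selects the largest fixed point, which is the one relevant for the core; below the onset the iteration collapses to $q = 0$.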



B. Improved bound for K-SAT

Now that the remaining clauses are correlated, the annealed entropy for $K$-SAT is not as easily
computed as for $K$-XOR-SAT. The technique parallels one used to find the annealed entropy for
the ferromagnetic model. We need to find the logarithm of the number of disorders $\mathcal{N}_J$ and the
logarithm of the number of spin-disorder combinations $\mathcal{N}_{s,J}$. In contrast to $K$-XOR-SAT, clauses
acquire a type $\tau$ - a vector, elements of which determine whether the variable in the $p$-th position
appears inverted or not ($\tau_p \in \{+,-\}$). Correspondingly, a vertex degree is now a vector $\mathbf{k}$ with
elements $k^p_\tau$ describing the number of appearances of a certain variable in the $p$-th position in
clauses of type $\tau$. We fix the number of variables with given $\mathbf{k}$: $N'_{\mathbf{k}}$ (corresponding fractions are
$c_{\mathbf{k}} = N'_{\mathbf{k}}/N'$). The number of disorders for fixed $\{N'_{\mathbf{k}}\}$ and $\{M'_\tau\}$ is composed of the following
factors:

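1. $N'! \big/ \prod_{\mathbf{k}} N'_{\mathbf{k}}!$ for the number of ways to divide the variables into classes.
2. $\prod_{p,\tau} M'_\tau! \big/ \prod_{\mathbf{k}} \big(\prod_{p,\tau} k^p_\tau!\big)^{N'_{\mathbf{k}}}$ for the number of ways to rearrange the variables among clauses.
3. $M'! \big/ \prod_{\tau} M'_\tau!$ for the number of permutations of clauses of various types.

Taking the logarithm, we obtain

$$S_J[\{c_{\mathbf{k}}\}] = -N'\sum_{\mathbf{k}} c_{\mathbf{k}} \ln\Big[c_{\mathbf{k}} \prod_{p,\tau} k^p_\tau!\Big] + K\sum_{\tau}\big(M'_\tau \ln M'_\tau - M'_\tau\big) + M'\ln M' - \sum_{\tau} M'_\tau \ln M'_\tau. \tag{39}$$

We must optimize over $c_{\mathbf{k}}$ taking into account the constraint that $c_{\mathbf{k}} = 0$ if either $|k| = 0$ or
$|\bar{k}| = 0$, where $|k|$ and $|\bar{k}|$ denote the total numbers of positive and negative appearances of the variable. We also have constraints $N'\sum_{\mathbf{k}} k^p_\tau c_{\mathbf{k}} = M'_\tau$. Introducing dual variables $\mu^p_\tau$ and a generating
function $G(x,\bar{x}) = (e^{x}-1)(e^{\bar{x}}-1)$ we can write

$$S_J[N',\{M'_\tau\}] = \min_{\{\mu^p_\tau\}}\Big\{\sum_{p,\tau} M'_\tau \ln\frac{M'_\tau}{\mu^p_\tau} + N'\ln G(\mu,\bar{\mu})\Big\} + M'\ln M' - \sum_{\tau} M'_\tau\ln M'_\tau - KM', \tag{40}$$

where

$$\mu = \sum_{p,\tau}\frac{1+\tau_p}{2}\,\mu^p_\tau, \tag{41}$$

$$\bar{\mu} = \sum_{p,\tau}\frac{1-\tau_p}{2}\,\mu^p_\tau. \tag{42}$$

Also introducing the quantities

$$\mathcal{M} = \sum_{p,\tau}\frac{1+\tau_p}{2}\,M'_\tau, \tag{43}$$

$$\bar{\mathcal{M}} = \sum_{p,\tau}\frac{1-\tau_p}{2}\,M'_\tau, \tag{44}$$

we rewrite $S_J$ as

$$S_J[N',\{M'_\tau\}] = \min_{\mu,\bar{\mu}}\Big\{\mathcal{M}\ln\frac{\mathcal{M}}{\mu} + \bar{\mathcal{M}}\ln\frac{\bar{\mathcal{M}}}{\bar{\mu}} + N'\ln G(\mu,\bar{\mu})\Big\} + M'\ln M' - \sum_{\tau} M'_\tau\ln M'_\tau - KM'. \tag{45}$$

For convenience we will write $G(\mu,\bar{\mu}) = G_1(\mu)G_1(\bar{\mu})$, where $G_1(x) = e^{x}-1$. One can verify
that $S_J$ is maximized when $M'_\tau = M'/2^K$ and $\mu = \bar{\mu} = \frac{K\alpha}{2}q^{K-1}$.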

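Now we need to evaluate $\mathcal{N}_{s,J}$. This time the clauses are parameterized by $\tau$ - the appearance
of literals in a clause - as well as $\sigma$ - the particular assignments of variables. We fix $\{M'_{\sigma\tau}\}$ as
well as $\{N'_{s,\mathbf{k}}\}$, with $\mathbf{k}$ being a vector with $K 2^{2K}$ elements: $k^p_{\sigma\tau}$ is the number of appearances of
a variable in clauses of type $(\sigma,\tau)$ in the $p$-th position. The number $\mathcal{N}_{s,J}$ can be broken into the
following factors:

1. $N'! \big/ \prod_{s,\mathbf{k}} N'_{s,\mathbf{k}}!$
2. $\prod_{p,\sigma,\tau} M'_{\sigma\tau}! \big/ \prod_{s,\mathbf{k}} \big(\prod_{p,\sigma,\tau} k^p_{\sigma\tau}!\big)^{N'_{s,\mathbf{k}}}$
3. $M'! \big/ \prod_{\sigma,\tau} M'_{\sigma\tau}!$

Enforcing constraints $N'\sum_{s,\mathbf{k}} k^p_{\sigma\tau} c_{s,\mathbf{k}} = M'_{\sigma\tau}$ as well as constraints on the vector $\mathbf{k}$, i.e. that
$\sum_{p,\sigma,\tau}\frac{1\pm\tau_p}{2}k^p_{\sigma\tau} \ge 1$ and that $c_{s,\mathbf{k}} = 0$ if for some $p,\sigma,\tau$ we have $k^p_{\sigma\tau} > 0$ and $\sigma_p \ne s$, we are able to cast
the expression for $S_{s,J}[N',\{M'_{\sigma\tau}\}]$ in a simple form

$$S_{s,J}[N',\{M'_{\sigma\tau}\}] = \min_{\{\mu^p_{\sigma\tau}\}}\Big\{\sum_{p,\sigma,\tau} M'_{\sigma\tau}\ln\frac{M'_{\sigma\tau}}{\mu^p_{\sigma\tau}} + N'\ln\big[G(\mu_+,\bar{\mu}_+) + G(\mu_-,\bar{\mu}_-)\big]\Big\} + M'\ln M' - \sum_{\sigma,\tau} M'_{\sigma\tau}\ln M'_{\sigma\tau} - KM', \tag{46}$$

where

$$\mu_\pm = \sum_{p,\sigma,\tau}\frac{1\pm\sigma_p}{2}\,\frac{1+\tau_p}{2}\,\mu^p_{\sigma\tau}, \tag{47}$$

$$\bar{\mu}_\pm = \sum_{p,\sigma,\tau}\frac{1\pm\sigma_p}{2}\,\frac{1-\tau_p}{2}\,\mu^p_{\sigma\tau}. \tag{48}$$

Also introducing

$$\mathcal{M}_\pm = \sum_{p,\sigma,\tau}\frac{1\pm\sigma_p}{2}\,\frac{1+\tau_p}{2}\,M'_{\sigma\tau}, \tag{49}$$

$$\bar{\mathcal{M}}_\pm = \sum_{p,\sigma,\tau}\frac{1\pm\sigma_p}{2}\,\frac{1-\tau_p}{2}\,M'_{\sigma\tau}, \tag{50}$$

we can rewrite the first part of $S_{s,J}$ as

$$S^{(1)}_{s,J}[N',\{M'\}] = \min_{\mu_\pm,\bar{\mu}_\pm}\Big\{\mathcal{M}_+\ln\frac{\mathcal{M}_+}{\mu_+} + \mathcal{M}_-\ln\frac{\mathcal{M}_-}{\mu_-} + \bar{\mathcal{M}}_+\ln\frac{\bar{\mathcal{M}}_+}{\bar{\mu}_+} + \bar{\mathcal{M}}_-\ln\frac{\bar{\mathcal{M}}_-}{\bar{\mu}_-} + N'\ln\big[G(\mu_+,\bar{\mu}_+) + G(\mu_-,\bar{\mu}_-)\big]\Big\}. \tag{51}$$

Next, we optimize the expression $M'\ln M' - \sum_{\sigma,\tau} M'_{\sigma\tau}\ln M'_{\sigma\tau}$ subject to fixed $\mathcal{M}_+$, $\mathcal{M}_-$, $\bar{\mathcal{M}}_+$, $\bar{\mathcal{M}}_-$.
Introducing dual variables $-h$, $-h'$, $-h''$ coupled to $\mathcal{M}_+ - \mathcal{M}_- - \bar{\mathcal{M}}_+ + \bar{\mathcal{M}}_-$, $\mathcal{M}_+ - \mathcal{M}_- + \bar{\mathcal{M}}_+ - \bar{\mathcal{M}}_-$ and $\mathcal{M}_+ + \mathcal{M}_- - \bar{\mathcal{M}}_+ - \bar{\mathcal{M}}_-$ respectively, the optimized expression becomes

$$S^{(2)}_{s,J}[N',\mathcal{M}] = \min_{h,h',h''}\Big\{-h(\mathcal{M}_+ - \mathcal{M}_- - \bar{\mathcal{M}}_+ + \bar{\mathcal{M}}_-) - h'(\mathcal{M}_+ - \mathcal{M}_- + \bar{\mathcal{M}}_+ - \bar{\mathcal{M}}_-) - h''(\mathcal{M}_+ + \mathcal{M}_- - \bar{\mathcal{M}}_+ - \bar{\mathcal{M}}_-) + M'\ln\sum_{\sigma,\tau}\epsilon_{\sigma\tau}\,e^{\left(\sum_p\sigma_p\tau_p\right)h + \left(\sum_p\sigma_p\right)h' + \left(\sum_p\tau_p\right)h''}\Big\}, \tag{52}$$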

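where $\epsilon_{\sigma\tau} \in \{0,1\}$ determines whether the clause is permitted. For the case of $K$-SAT we only
prohibit combinations with $\prod_p \frac{1+\sigma_p\tau_p}{2} = 1$. We can express $\mathcal{M}_\pm$, $\bar{\mathcal{M}}_\pm$ in terms of $h$, $h'$
and $h''$ and substitute into (51). Consequently, maximization over $h$, $h'$, $h''$ will be performed.
It can be shown that the maximum necessarily corresponds to $h' = h'' = 0$, leading to further
simplifications:

$$\sum_{\sigma,\tau}\epsilon_{\sigma\tau}\,e^{\left(\sum_p\sigma_p\tau_p\right)h} = (2\cosh h)^K - e^{Kh}, \tag{53}$$

and we can show that $\mathcal{M}_+ = \bar{\mathcal{M}}_-$ and $\mathcal{M}_- = \bar{\mathcal{M}}_+$. (As a result $\bar{\mu}_+ = \mu_-$ and $\bar{\mu}_- = \mu_+$.) The annealed entropy becomes

$$S_{\rm ann} = \max_h\Big(\min_{\mu_+}\Big\{2\mathcal{M}_+\ln\frac{2\mathcal{M}_+}{\mu_+} + N'\ln G_1(\mu_+)\Big\} + \min_{\mu_-}\Big\{2\mathcal{M}_-\ln\frac{2\mathcal{M}_-}{\mu_-} + N'\ln G_1(\mu_-)\Big\} - 2\min_{\mu}\Big\{\frac{KM'}{2}\ln\frac{KM'}{2\mu} + N'\ln G_1(\mu)\Big\} + N'\ln 2 - KM'\ln 2 - h(2\mathcal{M}_+ - 2\mathcal{M}_-) + M'\ln\big[(2\cosh h)^K - e^{Kh}\big]\Big), \tag{54}$$

where $\mathcal{M}_\pm$ are functions of $h$:

$$2\mathcal{M}_\pm = \frac{K}{2}M' \pm \frac{M'}{2}\,\frac{\partial}{\partial h}\ln\sum_{\sigma,\tau}\epsilon_{\sigma\tau}\,e^{\left(\sum_p\sigma_p\tau_p\right)h}, \tag{55}$$

or, substituting for $K$-SAT,

$$2\mathcal{M}_+ = KM'\,\frac{e^{h}\big[(2\cosh h)^{K-1} - e^{(K-1)h}\big]}{(2\cosh h)^K - e^{Kh}}, \tag{56}$$

$$2\mathcal{M}_- = KM'\,\frac{e^{-h}(2\cosh h)^{K-1}}{(2\cosh h)^K - e^{Kh}}. \tag{57}$$

We also verify that the maximum of the complete expression corresponds to $h = 0$. As a result

$$S_{\rm ann} = \min_{\mu}\Big\{2\mathcal{M}_+\ln\frac{2\mathcal{M}_+}{\mu} + N'\ln G_1(\mu)\Big\} + \min_{\mu}\Big\{2\mathcal{M}_-\ln\frac{2\mathcal{M}_-}{\mu} + N'\ln G_1(\mu)\Big\} - 2\min_{\mu}\Big\{\frac{KM'}{2}\ln\frac{KM'}{2\mu} + N'\ln G_1(\mu)\Big\} + N'\ln 2 + M'\ln\big[1 - 2^{-K}\big]. \tag{58}$$

Solving $S_{\rm ann} = 0$ translates into an upper bound for $K = 3$ of $\alpha_u \approx 5.189$ - a rather insignificant
improvement over the straightforward annealing approximation.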
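
As a numerical cross-check, the sketch below evaluates the annealed entropy of the core at $h = 0$ and bisects for its zero. It is a hedged illustration, not the authors' code, and it rests on the following reading of the $h = 0$ expression: $S_{\rm ann} = \phi(2\mathcal{M}_+) + \phi(2\mathcal{M}_-) - 2\phi(KM'/2) + N'\ln 2 + M'\ln(1-2^{-K})$ with $\phi(A) = \min_\mu\{A\ln(A/\mu) + N'\ln(e^\mu - 1)\}$, $2\mathcal{M}_+ = KM'(2^{K-1}-1)/(2^K-1)$ and $2\mathcal{M}_- = KM'\,2^{K-1}/(2^K-1)$. All function names are hypothetical, quantities are taken per variable ($N = 1$), and the bisection bracket is chosen for $K = 3$.

```python
import math


def core_q(alpha, K, iters=500):
    """Largest root of q = 1 - exp(-(K*alpha/2) q^(K-1)), by iteration from q = 1."""
    q = 1.0
    for _ in range(iters):
        q = 1.0 - math.exp(-0.5 * K * alpha * q ** (K - 1))
    return q


def phi(A, n, iters=500):
    """min over mu > 0 of A*ln(A/mu) + n*ln(e^mu - 1), assuming A > n.

    The stationarity condition is mu = (A/n) * (1 - exp(-mu)).
    """
    mu = A / n
    for _ in range(iters):
        mu = (A / n) * (1.0 - math.exp(-mu))
    return A * math.log(A / mu) + n * math.log(math.expm1(mu))


def s_ann(alpha, K):
    """Annealed entropy per variable of the trimmed formula at h = 0."""
    q = core_q(alpha, K)
    n, m = q * q, alpha * q ** K                        # N'/N and M'/N
    a_plus = K * m * (2 ** (K - 1) - 1) / (2 ** K - 1)  # 2*M_+ at h = 0
    a_minus = K * m * 2 ** (K - 1) / (2 ** K - 1)       # 2*M_- at h = 0
    a_zero = K * m / 2
    return (phi(a_plus, n) + phi(a_minus, n) - 2 * phi(a_zero, n)
            + n * math.log(2) + m * math.log(1 - 2.0 ** (-K)))


def upper_bound(K=3, lo=4.0, hi=6.0):
    """Bisect for the zero of s_ann; the bracket [lo, hi] assumes K = 3."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if s_ann(mid, K) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("improved bound:", upper_bound(3))
    print("plain annealed bound:", math.log(2) / -math.log(1 - 2.0 ** -3))
```

The plain annealed bound $\ln 2 / \ln(8/7) \approx 5.191$ is printed for comparison; under the assumptions above the improved value should fall slightly below it.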



IV. POSITIVE 1-IN-K-SAT MODEL 

In this model, we have a set of clauses, each clause involving $K$ variables that can take values
0 or 1. A clause is satisfied if the sum of values of variables is exactly 1. A formula is satisfied if all
the clauses that constitute it are satisfied. A related problem was considered in |Q] in the context 
of the quantum adiabatic algorithm, which served as the main motivation for present analysis. 
For a randomly generated formula, the satisfiability transition occurs for some critical clause-to-
variable ratio $\alpha = M/N$. The easiest upper bound is obtained using the straightforward annealing
approximation. For the logarithm of the expected number of solutions we obtain (for $K = 3$)

$$\ln E[\mathcal{N}] = \max_m\Big\{-N\Big(\frac{1+m}{2}\ln\frac{1+m}{2} + \frac{1-m}{2}\ln\frac{1-m}{2}\Big) + M\ln\Big[3\,\frac{1-m}{2}\Big(\frac{1+m}{2}\Big)^2\Big]\Big\}, \tag{59}$$

where we identified $m = 1$ with having all variables assigned a value of 0, $m = -1$ with having
all variables set to 1, and intermediate values of $m$ being the appropriate mixture.

The annealed entropy becomes 0 at the critical threshold $\alpha_u \approx 0.805$. We now seek to improve
upon this simplistic approximation.
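
The value $\alpha_u \approx 0.805$ can be reproduced with a few lines of code. The sketch below is an illustration under the stated formula, with a hypothetical function name: writing $p = (1-m)/2$ for the fraction of variables set to 1, the annealed entropy per variable is $H(p) + \alpha\ln[K p (1-p)^{K-1}]$, so it stays non-negative for some $p$ exactly when $\alpha \le \max_p H(p) / \big(-\ln[K p (1-p)^{K-1}]\big)$.

```python
import math


def annealed_threshold(K=3, grid=100000):
    """Largest alpha for which max_p [H(p) + alpha*ln(K p (1-p)^(K-1))] >= 0."""
    best = 0.0
    for i in range(1, grid):
        p = i / grid                                   # p = (1 - m) / 2
        entropy = -p * math.log(p) - (1 - p) * math.log(1 - p)
        log_sat = math.log(K * p * (1 - p) ** (K - 1))  # ln P(clause satisfied) < 0
        best = max(best, entropy / -log_sat)
    return best


if __name__ == "__main__":
    print(annealed_threshold(3))
```

A plain grid search is used instead of a root finder because the ratio is a smooth function of a single variable with a broad maximum.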



A. Core for positive 1-in-K-SAT

The structure of the core for positive 1-in-$K$-SAT is more complex than what we have seen
before. As before, variables of degree 0 are eliminated. Similarly, variables of degree 1 are removed,
although we are no longer justified in removing the clause in which the variable appears. Instead, the
corresponding $K$-clause has to be replaced with a $(K-1)$-clause. The latter is deemed to be
satisfied if the sum of variables in it is either 0 or 1. Then the removed variable can always be
set to either 0 or 1 so that the sum of all $K$ variables is exactly 1. Similarly, if any variable has
degree 1 and appears in a $(K-1)$-clause, the latter can be converted to a $(K-2)$-clause and so
on. For all clauses of length less than $K$, the criterion for satisfiability is that the sum of variables
be either 0 or 1. Finally, we identify variables that appear only in 2-clauses. Setting any such
variable to 0 will satisfy all 2-clauses in which it appears. Thus, such variables and the clauses in which they appear can
be eliminated. This process continues until we are left with a subformula where the degree of each
variable is $\ge 2$ and no variable appears only in 2-clauses (see Fig. 3).
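
The trimming rules just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `trim_core` and the data layout are assumptions, $K \ge 3$ is assumed, and every clause is assumed to contain $K$ distinct variables, so that any clause shorter than $K$ is a relaxed clause (sum equal to 0 or 1).

```python
def trim_core(clauses):
    """Return the clauses of the core as sorted tuples of variable ids.

    clauses: iterable of tuples of distinct variable ids, all of length K >= 3.
    """
    live = {i: set(c) for i, c in enumerate(clauses)}   # clause id -> variables
    occ = {}                                            # variable -> clause ids
    for i, c in live.items():
        for v in c:
            occ.setdefault(v, set()).add(i)

    def drop_clause(i):
        # Remove clause i and update the occurrence lists of its variables.
        for v in live.pop(i):
            occ[v].discard(i)

    changed = True
    while changed:
        changed = False
        for v in list(occ):
            cl = occ[v]
            if not cl:
                # Degree 0: the variable is simply eliminated.
                del occ[v]
                changed = True
            elif all(len(live[i]) == 2 for i in cl):
                # Only in 2-clauses: setting v = 0 satisfies all of them.
                for i in list(cl):
                    drop_clause(i)
                del occ[v]
                changed = True
            elif len(cl) == 1:
                # Degree 1 in a longer clause: shorten the clause by one.
                (i,) = cl
                live[i].discard(v)
                del occ[v]
                changed = True
    return sorted(tuple(sorted(c)) for c in live.values())


if __name__ == "__main__":
    print(trim_core([(0, 1, 2), (2, 3, 4)]))                        # chain
    print(trim_core([(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]))  # dense
```

Testing the "only in 2-clauses" rule before the degree-1 rule means a degree-1 variable in a 2-clause removes that clause outright, which is equivalent to shortening it to a trivially satisfied 1-clause.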

For any fixed $N'$ and a set of $\{M'_k\}$ (with $k = 2, \ldots, K$) - the number of clauses of length $k$
- all subformulae that satisfy the aforementioned constraints are equally probable. The values $N'/N$
and $\{M'_k/N\}$ are self-averaging and their means will be computed shortly.

As before, we introduce the following notation. $C$ denotes the set of variables that belong to the
core. In addition to $C$ we introduce sets $C'$ and $C'_2$. The sets shall have the following properties:

1. $C \subseteq C' \subset C'_2$.

2. If 2 variables in some clause belong to $C'_2$, then all variables in that clause belong to $C'$.

3. If 1 variable in some clause belongs to $C'$, then all variables in that clause belong to $C'_2$.

FIG. 3: Example of the trimming algorithm for 1-in-4-SAT. Variables are represented graphically as vertices
and 4-, 3- and 2-clauses are represented as rhombi, triangles and edges correspondingly. Incomplete
polygons represent connections to the remainder of the graph. The figure depicts evolution of part of the
graph under the trimming algorithm.

We reserve the notation $p = |C|/N$, $q = |C'|/N$ and $q_2 = |C'_2|/N$. As before, we single out a single
variable $x$ and study the probability that the variable belongs to classes $C$, $C'$ or $C'_2$. The number
of clauses in which the variable appears is Poisson with parameter $K\alpha$. The variable $x$ is in $C'$
if for at least one clause in which $x$ appears at least two variables among the $K-1$ remaining
variables belong to $C'_2$. Hence

$$q = 1 - \exp\Big[-K\alpha\Big(1 - (1-q_2)^{K-1} - (K-1)\,q_2\,(1-q_2)^{K-2}\Big)\Big], \tag{60}$$

where we have used the fact that the probability that among $K-1$ randomly chosen variables
at least two belong to $C'_2$ is $1 - (1-q_2)^{K-1} - (K-1)\,q_2\,(1-q_2)^{K-2}$.

The variable $x$ is in $C'_2$ if for at least one clause, at least one variable among the other $K-1$
variables belongs to $C'$ or at least two variables belong to $C'_2$. The probability of that is
$1 - (1-q_2)^{K-1} - (K-1)(q_2-q)(1-q_2)^{K-2}$. The second self-consistency equation is thus

$$q_2 = 1 - \exp\Big[-K\alpha\Big(1 - (1-q_2)^{K-1} - (K-1)(q_2-q)(1-q_2)^{K-2}\Big)\Big]. \tag{61}$$

Consider clauses in which the variable $x$ appears. Let us call those clauses in which at least two
of the other variables belong to $C'_2$ type-1 clauses, and those clauses in which one of the other variables belongs to $C'$ type-2
clauses. Variable $x$ is in $C$ if it appears in two or more type-1 or type-2 clauses, and at least one
type-1 clause. Therefore, we should have

$$p = 1 - e^{-K\alpha p_1} - K\alpha\,p_1\,e^{-K\alpha p_2}, \tag{62}$$

where $p_1$ is the probability that a clause is of type 1 and $p_2$ is the probability that it is of type 1 or type 2:

$$p_1 = 1 - (1-q_2)^{K-1} - (K-1)\,q_2\,(1-q_2)^{K-2}, \tag{63}$$

$$p_2 = 1 - (1-q_2)^{K-1} - (K-1)(q_2-q)(1-q_2)^{K-2}. \tag{64}$$

To find the number of $k$-clauses in the core $M'_k$, compute the average $k$-degree of variable $x$, i.e.
the number of $k$-clauses in which it appears. We readily obtain the following formulae:

$$M'_2/N = \frac{K(K-1)}{2}\,\alpha\,q^2\,(1-q_2)^{K-2}, \tag{65}$$

$$M'_k/N = \binom{K}{k}\,\alpha\,q_2^{k}\,(1-q_2)^{K-k}, \quad \text{for } k \ge 3. \tag{66}$$
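
The coupled self-consistency equations (60)-(62) are easy to solve by iteration. The sketch below is only an illustration: the function name `core_fractions` is hypothetical, $K \ge 3$ is assumed, and the iteration is started from $q = q_2 = 1$ so that it converges to the largest fixed point.

```python
import math


def core_fractions(alpha, K=3, iters=5000):
    """Iterate Eqs. (60)-(61) from q = q2 = 1 and return (p, q, q2)."""
    Ka = K * alpha

    def clause_probs(q, q2):
        none = (1 - q2) ** (K - 1)                     # no other variable in C'_2
        p1 = 1 - none - (K - 1) * q2 * (1 - q2) ** (K - 2)        # Eq. (63)
        p2 = 1 - none - (K - 1) * (q2 - q) * (1 - q2) ** (K - 2)  # Eq. (64)
        return p1, p2

    q = q2 = 1.0
    for _ in range(iters):
        p1, p2 = clause_probs(q, q2)
        q, q2 = 1 - math.exp(-Ka * p1), 1 - math.exp(-Ka * p2)
    p1, p2 = clause_probs(q, q2)
    p = 1 - math.exp(-Ka * p1) - Ka * p1 * math.exp(-Ka * p2)     # Eq. (62)
    return p, q, q2


if __name__ == "__main__":
    for alpha in (0.1, 0.6):
        print(alpha, core_fractions(alpha))
```

For small $\alpha$ the iteration collapses to $p = q = q_2 = 0$ (no core), while for larger $\alpha$ it settles on a non-trivial solution with $p < q < q_2$.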

B. Improved bound for positive 1-in-K-SAT

As before, we compute $\mathcal{N}_J$ - the number of disorders, subject to fixed $N'$ and $\{M'_k\}$, under the
condition that each variable has a degree of at least two, and that no variable appears in 2-clauses
exclusively. Introduce a vector of length $K-1$ of vertex degrees $(k_2, \ldots, k_K)$, with elements being
the number of $k$-clauses in which the variable appears. We prohibit vertices with $\sum_{i=3}^{K} k_i = 0$ or
$\sum_{i=2}^{K} k_i = 1$. The corresponding generating function is

$$G(\{x_k\}) = \sum_{\{k_i\}}{}^{'}\ \prod_{i=2}^{K}\frac{x_i^{k_i}}{k_i!} = e^{\sum_{i=2}^{K} x_i} - e^{x_2} - \sum_{i=3}^{K} x_i, \tag{67}$$

where the primed sum runs over the allowed degree vectors. It is convenient to write $G(\{x_k\}) = G_2\big(x_2, \sum_{k=3}^{K} x_k\big)$, where $G_2(x,y) = e^{x+y} - e^{x} - y$.

We proceed to counting the number of disorders with fixed $N'$ and $\{M'_k\}$. It is convenient to
introduce the quantities $N'_{k_2 \ldots k_K}$ that count the number of vertices; indices $k_i^p$ being the number of
appearances in the $p$-th position in a clause of length $i$. Starting from

$$S_J[c_{k_2 \ldots k_K}] = -N'\sum_{k_2 \ldots k_K} c_{k_2 \ldots k_K}\ln\Big[c_{k_2 \ldots k_K}\prod_{i,p} k_i^p!\Big] + \sum_{k=2}^{K} k\,\big(M'_k\ln M'_k - M'_k\big) \tag{68}$$

and optimizing over $c_{k_2 \ldots k_K}$ subject to constraints on degrees as well as the set of constraints

$$N'\sum_{k_2 \ldots k_K} k_i^p\,c_{k_2 \ldots k_K} = M'_i \tag{69}$$

we obtain

$$S_J[N',\{M'_k\}] = \min_{\{\mu_k\}}\Big\{\sum_{k=2}^{K} kM'_k\ln\frac{kM'_k}{\mu_k} + N'\ln G(\{\mu_k\})\Big\} - \sum_{k=2}^{K} kM'_k. \tag{70}$$

Using the relation $G(\{\mu_k\}) = G_2\big(\mu_2, \sum_{k=3}^{K}\mu_k\big)$ rewrite

$$S_J[N',\{M'_k\}] = \min_{\mu_2,\mu}\Big\{2M'_2\ln\frac{2M'_2}{\mu_2} + \Big(\sum_{k=3}^{K} kM'_k\Big)\ln\frac{\sum_{k=3}^{K} kM'_k}{\mu} + N'\ln G_2(\mu_2,\mu)\Big\} - \sum_{k=2}^{K} kM'_k. \tag{71}$$


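Now, compute the total number of disorders and variable assignments compatible with them. The
clauses of each length have to be subdivided into types $\sigma_2$ through $\sigma_K$, according to the
variable assignments in the corresponding clause. We arrange variables into classes according to
their value $s \in \{+,-\}$ and a vector $(k_2, \ldots, k_K)$, with $k^p_{i\sigma_i}$ being the number of appearances of a
variable in a clause of length $i$ and type $\sigma_i$ in the $p$-th position. The number $\mathcal{N}_{s,J}$ is given as a product
of three factors:

1. $N'! \big/ \prod_{s,k_2 \ldots k_K} N'_{s,k_2 \ldots k_K}!$ for the number of ways to rearrange the variables into classes.
2. $\prod_{i=2}^{K}\prod_{p,\sigma_i} M'_{i\sigma_i}! \big/ \prod_{s,k_2 \ldots k_K}\big(\prod_{i,p,\sigma_i} k^p_{i\sigma_i}!\big)^{N'_{s,k_2 \ldots k_K}}$ for the number of ways to rearrange variables inside the clauses.
3. $\prod_{i=2}^{K}\big[M'_i! \big/ \prod_{\sigma_i} M'_{i\sigma_i}!\big]$ for the number of ways to rearrange clauses.

For the entropy we obtain

$$S_{s,J}[c_{s,k_2 \ldots k_K}] = -N'\sum_{s,k_2 \ldots k_K} c_{s,k_2 \ldots k_K}\ln\Big[c_{s,k_2 \ldots k_K}\prod_{i,\sigma_i,p} k^p_{i\sigma_i}!\Big] + \sum_{i,\sigma_i,p}\big(M'_{i\sigma_i}\ln M'_{i\sigma_i} - M'_{i\sigma_i}\big) + \sum_{i} M'_i\ln M'_i - \sum_{i,\sigma_i} M'_{i\sigma_i}\ln M'_{i\sigma_i}. \tag{72}$$

We must note the constraints

$$N'\sum_{s,k_2 \ldots k_K} k^p_{i\sigma_i}\,c_{s,k_2 \ldots k_K} = M'_{i\sigma_i}, \tag{73}$$

as well as constraints on variable value ($k^p_{i\sigma_i} > 0 \Rightarrow \sigma_i^p = s$) and on the degrees of the variables
($\sum_{i=3}^{K} k_i \ge 1$ and $\sum_{i=2}^{K} k_i \ge 2$). With the aid of the generating function and the dual variables
we can write

$$S_{s,J}[N',\{M'_{k\sigma_k}\}] = \min_{\{\mu^p_{k\sigma_k}\}}\Big\{\sum_{k,\sigma_k,p} M'_{k\sigma_k}\ln\frac{M'_{k\sigma_k}}{\mu^p_{k\sigma_k}} + N'\ln\big[G(\{\mu_{k+}\}) + G(\{\mu_{k-}\})\big]\Big\} + \sum_{k} M'_k\ln M'_k - \sum_{k,\sigma_k} M'_{k\sigma_k}\ln M'_{k\sigma_k} - \sum_{k} kM'_k, \tag{74}$$

where we have written $\mu_{k\pm} = \sum_{p,\sigma_k}\frac{1\pm\sigma_k^p}{2}\,\mu^p_{k\sigma_k}$. Also, introducing $\mathcal{M}_{k\pm} = \sum_{p,\sigma_k}\frac{1\pm\sigma_k^p}{2}\,M'_{k\sigma_k}$, the
first subexpression can be simplified to

$$S^{(1)}_{s,J}[N',\mathcal{M}] = \min_{\{\mu_{k\pm}\}}\Big\{\sum_{k=2}^{K}\Big(\mathcal{M}_{k+}\ln\frac{\mathcal{M}_{k+}}{\mu_{k+}} + \mathcal{M}_{k-}\ln\frac{\mathcal{M}_{k-}}{\mu_{k-}}\Big) + N'\ln\big[G(\{\mu_{k+}\}) + G(\{\mu_{k-}\})\big]\Big\}, \tag{75}$$

and using $G_2$ can be rewritten as

$$S^{(1)}_{s,J}[N',\mathcal{M}] = \min_{\mu_{2\pm},\mu_\pm}\Big\{\mathcal{M}_{2+}\ln\frac{\mathcal{M}_{2+}}{\mu_{2+}} + \mathcal{M}_{2-}\ln\frac{\mathcal{M}_{2-}}{\mu_{2-}} + \Big(\sum_{k=3}^{K}\mathcal{M}_{k+}\Big)\ln\frac{\sum_{k=3}^{K}\mathcal{M}_{k+}}{\mu_+} + \Big(\sum_{k=3}^{K}\mathcal{M}_{k-}\Big)\ln\frac{\sum_{k=3}^{K}\mathcal{M}_{k-}}{\mu_-} + N'\ln\big[G_2(\mu_{2+},\mu_+) + G_2(\mu_{2-},\mu_-)\big]\Big\}. \tag{76}$$
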

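In correspondence with the different treatment afforded to 2-clauses and $k$-clauses for $k \ge 3$, we
introduce two fields $-h_2$ and $-h$ coupled to $\mathcal{M}_{2+} - \mathcal{M}_{2-}$ and $\sum_{k=3}^{K}(\mathcal{M}_{k+} - \mathcal{M}_{k-})$ correspondingly.
The dual of the second part of $S_{s,J}$ is

$$S^{(2)}_{s,J}[h_2,h] = \max_{\{M'_{k\sigma_k}\}}\Big\{h_2(\mathcal{M}_{2+} - \mathcal{M}_{2-}) + h\sum_{k=3}^{K}(\mathcal{M}_{k+} - \mathcal{M}_{k-}) + \sum_{k}\Big(M'_k\ln M'_k - \sum_{\sigma_k} M'_{k\sigma_k}\ln M'_{k\sigma_k}\Big)\Big\}. \tag{77}$$

Note that for $k = K$ only types with $K-1$ entries equal to $+$ are allowed, while for $k < K$ types with $k$
or $k-1$ entries equal to $+$ are both allowed. After proper minimizations we obtain

$$S^{(2)}_{s,J}[h_2,h] = M'_2\ln\big(2 + e^{h_2}\big) + \sum_{k=3}^{K-1} M'_k\ln\big(k\,e^{(k-1)h} + e^{kh}\big) + M'_K\ln\big(K e^{(K-1)h}\big). \tag{78}$$

We can express $\{\mathcal{M}_{k\pm}\}$ in terms of $h_2$ and $h$ via

$$\mathcal{M}_{2\pm} = M'_2 \pm \frac{1}{2}\frac{\partial}{\partial h_2} S^{(2)}_{s,J}[h_2,h], \tag{79}$$

$$\sum_{k=3}^{K}\mathcal{M}_{k\pm} = \sum_{k=3}^{K}\frac{kM'_k}{2} \pm \frac{1}{2}\frac{\partial}{\partial h} S^{(2)}_{s,J}[h_2,h]. \tag{80}$$

The entire expression for the annealed entropy is then written as

$$S_{\rm ann}[h_2,h] = \min_{\mu_{2\pm},\mu_\pm}\Big\{\mathcal{M}_{2+}\ln\frac{\mathcal{M}_{2+}}{\mu_{2+}} + \mathcal{M}_{2-}\ln\frac{\mathcal{M}_{2-}}{\mu_{2-}} + \Big(\sum_{k=3}^{K}\mathcal{M}_{k+}\Big)\ln\frac{\sum_{k=3}^{K}\mathcal{M}_{k+}}{\mu_+} + \Big(\sum_{k=3}^{K}\mathcal{M}_{k-}\Big)\ln\frac{\sum_{k=3}^{K}\mathcal{M}_{k-}}{\mu_-} + N'\ln\big[G_2(\mu_{2+},\mu_+) + G_2(\mu_{2-},\mu_-)\big]\Big\} - \min_{\mu_2,\mu}\Big\{2M'_2\ln\frac{2M'_2}{\mu_2} + \Big(\sum_{k=3}^{K} kM'_k\Big)\ln\frac{\sum_{k=3}^{K} kM'_k}{\mu} + N'\ln G_2(\mu_2,\mu)\Big\} - h_2(\mathcal{M}_{2+} - \mathcal{M}_{2-}) - h\sum_{k=3}^{K}(\mathcal{M}_{k+} - \mathcal{M}_{k-}) + M'_2\ln\big(2 + e^{h_2}\big) + \sum_{k=3}^{K-1} M'_k\ln\big(k\,e^{(k-1)h} + e^{kh}\big) + M'_K\ln\big(K e^{(K-1)h}\big). \tag{81}$$

Maximization over $h_2$, $h$, and solving $S_{\rm ann} = 0$ gives an upper bound for the satisfiability transition.
For $K = 3$ we obtain $\alpha_u \approx 0.644$. This compares favorably to $\alpha_c \approx 0.625$ observed in simulations
and beats the previous best upper bound of $\alpha_u \approx 0.727$ [19].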



V. SIMULATION RESULTS 



In this section we present experimental results on random positive 1-in-3-SAT instances. Using
the Davis-Putnam (DP) algorithm (see Appendix A) we study the crossover point and the
computational complexity. We also identify experimentally the position of the phase transition.



A. The Crossover Point 



The major feature of a phase transition in a satisfiability problem is the presence of a threshold 
in a, below which almost all random problem instances are solvable, and above which almost 
no random problem instances are. Figure 4 shows a plot of the proportion of random problem
instances that have a satisfying assignment, versus a, for various values of N. The proportions are 
based on running the DP algorithm on 50,000 random problem instances for each value of N and 
a. The expected features are present. The sharpness of the phase transition increases with N, and 
the point at which the curve crosses the line where the proportion of instances with a satisfying 
assignment equals 0.5 decreases with N. 

Experimentally the crossover point is at $\alpha_c \approx 0.625$, slightly lower than the upper bound of
$\alpha_u \approx 0.644$ computed in section IV. In figure 5 (lower curve) we plot the value of $\alpha$ for which
50% of the problem instances were satisfiable as a function of the number of bits. The curve
appears to have an asymptote around $\alpha_c \approx 0.625$.

FIG. 4: Proportion of problem instances with a satisfying assignment



B. Complexity of the Davis-Putnam Algorithm 

Figure 6 shows plots of the median complexity of the Davis-Putnam (DP) algorithm (complexity
is defined as the number of calls to the function Find_Model displayed in Table I). The
median was taken over 50,000 random problem instances. As expected, because the DP algorithm
is complete, its complexity scales exponentially with problem size, $N$. Note also that the value
of $\alpha$ for which the maximum complexity occurs is above $\alpha_c$, and slowly decreases as $N$ increases.
In figure 5 (upper curve) we plot the position of the maximum complexity and its uncertainty.
For the range of values of $N$ considered, it does not appear to have converged to an
asymptote, but the curve does not appear to contradict our earlier result of $\alpha_c \approx 0.625$.

FIG. 5: Top curve: plot of the maximum complexity of the DP algorithm. Lower curve: the position of the
crossing point of proportion with satisfying assignment = 0.5

Fitting an exponential law to the peak complexity gives $C = 6.13 \exp(0.0067 N)$, a very
slow rate of increase - an order of magnitude slower than reported results on the complexity of DP
applied to 3-SAT U. 
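
To make the fitted law concrete, the short sketch below evaluates it and computes the increase in $N$ needed to double the peak complexity, $\ln 2 / 0.0067$. The function names are hypothetical; the two constants are the fitted values quoted above.

```python
import math

PREFACTOR = 6.13   # fitted prefactor quoted in the text
RATE = 0.0067      # fitted exponential rate per variable quoted in the text


def peak_complexity(n):
    """Fitted median number of Find_Model calls at the complexity peak."""
    return PREFACTOR * math.exp(RATE * n)


def doubling_size():
    """Increase in N that doubles the fitted peak complexity."""
    return math.log(2) / RATE


if __name__ == "__main__":
    print("doubling size:", doubling_size())
    for n in (100, 200, 400):
        print(n, peak_complexity(n))
```

With this rate the fitted complexity doubles roughly every 103 additional variables.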

VI. SUMMARY 

In this paper we have proposed a new method for analyzing subgraphs (subformulae) of the
random graph (formula) subject to simple geometric constraints. For every constraint satisfaction
problem one can identify a core - a subformula that is satisfiable if and only if the original formula
is satisfiable. In fact, simplifying the original formula is typically a first step before applying
general-purpose algorithms such as the Davis-Putnam routine or simulated annealing, and the best
algorithms use it. The method may become an essential tool for the analysis of "smart" algorithms that
perform transformations on the instance of the problem or even on intermediate steps. We have
also applied the methods used in the present paper to the approximate analysis of the quantum
adiabatic algorithm for the positive 1-in-K-SAT problem J2I I.

FIG. 6: Computational complexity of DP


We have also tried to estimate the satisfiability transition from above for three problems:
$K$-XOR-SAT, $K$-SAT and positive 1-in-$K$-SAT. The results for $K = 3$ are as follows: $\alpha_u \approx 0.918$
for $K$-XOR-SAT (exact), $\alpha_u \approx 5.189$ for $K$-SAT (vs. $\alpha_c \approx 4.2$ experimentally) and $\alpha_u \approx 0.644$
for positive 1-in-$K$-SAT (vs. $\alpha_c \approx 0.625$ experimentally).

The bound for $K$-SAT was an insignificant improvement over the annealing approximation
despite deleting irrelevant clauses that contribute to the entropy. Results for $K$-XOR-SAT and
positive 1-in-$K$-SAT were quite good. Note that random 1-in-$K$-SAT (where variables may appear in clauses
either positively or negatively with probability 1/2, akin to $K$-SAT) is quite simple. The
satisfiability transition coincides with percolation, and algorithms solve the problem very efficiently in
the satisfiable phase. A precise way to state this is that the dynamical transition coincides with the
satisfiability transition, shrinking the difficult region. This is not the case for the positive 1-in-$K$-SAT
that we consider, where most likely $\alpha_d < \alpha_c$.

That the annealing approximation for the simplified formula fails to predict the correct
transition suggests that a large number of solutions remains up to the satisfiability threshold. In all
likelihood these individual solutions are well-separated, which may explain the poor performance
of algorithms. We conjecture that random instances of positive 1-in-$K$-SAT are significantly
simpler to solve than those of $K$-SAT. This view is partly supported by simulations. Also observe that
the answer for $K$-XOR-SAT - a polynomial problem - is exact.
APPENDIX A: THE DAVIS-PUTNAM ALGORITHM

The Davis-Putnam (DP) algorithm J^j], or a variation, is regarded as the most efficient com- 
plete algorithm for satisfiability problems. An outline of the DP algorithm is given in table U 
I2I]. The version we used varies from this outline in one major respect. We perform a sort of 
the variables before the first call to Find_Model, sorting on the number of clauses which use 
the variable. This was found to produce, on average, a very large speed-up in the algorithm's 
execution. 

The unit_propagate step of the algorithm is also extremely efficient for the positive 1-in-3-SAT
problem. Once one variable in a clause is set to 1, the value of the other two variables is fixed
(both must be 0), and extensive propagation often occurs. For the same reason, we call
Find_Model ( theory AND x ) first.

TABLE I: Outline of the Davis-Putnam Algorithm

Find_Model ( theory )
    unit_propagate ( theory ) ;
    if contradiction discovered return ( false ) ;
    else if all variables are valued return ( true ) ;
    else {
        x = some unvalued variable ;
        return ( Find_Model ( theory AND x ) OR
                 Find_Model ( theory AND NOT x ) ) ;
    }
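
The outline in Table I can be turned into a small runnable sketch for positive 1-in-$K$-SAT. This is an illustration under stated assumptions, not the code used for the simulations: the function names, the dictionary-based assignment and the instance generator are all hypothetical, and the initial sort of variables by the number of clauses mentioned above is omitted. Clauses are tuples of distinct variable indices, and a clause is satisfied when exactly one of its variables is 1.

```python
import random


def unit_propagate(clauses, assign):
    """Assign all forced values in place; return False on a contradiction."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            ones = sum(1 for v in clause if assign.get(v) == 1)
            free = [v for v in clause if v not in assign]
            if ones > 1 or (ones == 0 and not free):
                return False                 # two 1s, or all 0s: clause violated
            if ones == 1 and free:
                for v in free:               # one 1 fixes all other variables to 0
                    assign[v] = 0
                changed = True
            elif ones == 0 and len(free) == 1:
                assign[free[0]] = 1          # last free variable must be the 1
                changed = True
    return True


def find_model(clauses, n_vars, assign=None):
    """Return a satisfying assignment {variable: 0 or 1}, or None if none exists."""
    assign = dict(assign or {})
    if not unit_propagate(clauses, assign):
        return None
    free = [v for v in range(n_vars) if v not in assign]
    if not free:
        return assign
    x = free[0]
    for value in (1, 0):                     # try x = 1 first, as in the text
        model = find_model(clauses, n_vars, {**assign, x: value})
        if model is not None:
            return model
    return None


def random_instance(n_vars, n_clauses, k=3, seed=0):
    """Random positive 1-in-k-SAT instance with distinct variables per clause."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n_vars), k)) for _ in range(n_clauses)]


if __name__ == "__main__":
    instance = random_instance(n_vars=20, n_clauses=10)
    print(find_model(instance, 20))
```

Counting the calls to `find_model` would give the complexity measure used in Section V; the recursion depth is bounded by the number of variables, so this sketch is only suitable for small instances.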